In this tutorial, we’ll explore how to use R Studio as a mapping application, by drawing on the tmap package (among other packages). In our applied example, we’ll focus on taking an indicator from the World Bank’s Development Indicators (a widely used collection of cross-national time-series data on economic development), and displaying this indicator on a world map. More specifically, we will be making a choropleth map; choropleths are only one of many different kinds of maps out there, but they are widely used in the social sciences, since so much of our data is connected with enumeration units (like census tracts, counties, countries etc.). For a good primer on choropleth maps, please see this guide.
In working through the example below, you will become familiar with the basic mechanics of a simple, yet fundamentally important GIS workflow: the process of joining a conventional tabular dataset (that is, a dataset without explicit spatial coordinates attached to it) to a spatial dataset (that is, an explicitly georeferenced dataset), with a view towards displaying information from the tabular dataset on the geographic boundaries delineated by the spatial dataset.
Why map the information in tabular datasets to begin with? To some extent, for the same reason we attempt to make a visualization of any kind. It can help to communicate information; it can help us identify patterns we wouldn’t have otherwise noticed; and it can suggest novel hypotheses. More specifically, making a map might alert us to the spatial dimensions of the social phenomenon we are investigating, which might affect our methodological or substantive approach to a given research problem. Just as good social scientists try to contextualize their social process or outcome of interest in time, it is also important to contextualize it in space; making a map is often the first step in this process.
It is important to underscore that this tutorial is not an introduction to cartography, which is an art unto itself, and beyond our scope. However, the tutorial will show you how to customize maps in R, and you can use these skills (after reading up on cartography on your own) to make maps that conform to sound cartographic design principles. Our goal here is not to make works of art, but rather to learn how to quickly create functional and informative choropleth maps that are useful to social scientists.
For more on principles of cartographic design, a good place to start is this accessible guide from Axis Maps. As you begin to make different types of maps (or more sophisticated ones, with more moving parts than a standard choropleth map), you may want to begin with these resources from Esri: Design Principles for Cartography and Make Maps People Want to Look At, both by Aileen Buckley.
R is an open-source programming language for statistical computing that allows users to carry out a wide range of data analysis and visualization tasks (among other things). One of the big advantages of using R is that it has a very large user community among social scientists and statisticians, who frequently publish R packages. One might think of packages as workbooks of sorts, which contain a well-integrated set of R functions, scripts, data, and documentation; these “workbooks” are designed to facilitate certain tasks or implement given procedures. These packages are then shared with the broader community, and at this point, anyone who needs to accomplish the tasks to which the package addresses itself can use the package in the context of their own projects. The ability to use published packages considerably simplifies the work of applied social scientists using R; it means that they rarely have to write code entirely from scratch, and can build on the code that others have published in the form of packages. This allows applied researchers to focus on substantive problems, without having to get too bogged down in complicated programming tasks.
In the context of this tutorial, generating choropleth maps based on the World Development Indicators (WDI) data would be relatively complex if we had to write all our code from scratch. However, because we are able to make use of mapping and visualization packages written by other researchers, the task is considerably simpler, and will not require any complicated programming.
In order to process our data and make our maps, we will use a variety of packages. They are:
To install a package in R, pass the name of the package (within quotation marks) to the install.packages() function. For example, let’s say you don’t have tmap installed. You can install it with the following:
# Installs tmap packages
install.packages("tmap")
A function is essentially a programming construct that takes a specified input, runs this input (called an “argument”) through a set of procedures, and returns an output. In the code block above, the name of the package we wanted to install (here, “tmap”) was enclosed within quotation marks and passed as an argument to install.packages; this effectively downloaded the tmap package to your computer.
Repeat that process for any packages you don’t have installed.
After all the packages are downloaded, we must load them into memory. We can think of the process of loading installed packages into a current R environment as analogous to opening up an application on your phone or computer after it has been installed (even after an application has been installed, you can’t use it until you open it!). To load (i.e. “open”) an R package, we pass the name of the package we want to load as an argument to the library() function. Below, we load all of the required packages into memory:
# Loads required libraries
library(WDI)
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(tidyverse)
library(grid)
At this point, the packages are loaded and ready to go! One important thing to note regarding the installation and loading of packages is that we only have to install packages once; after a package is installed, there is no need to subsequently reinstall it. However, we must load the packages we need (using the library function) every time we open a new R session. In other words, if we were to close R Studio at this point and open it up later, we would not need to install these packages again, but would need to load the packages again (3.5).
Note that the codeblocks in this tutorial usually have comments, prefaced by a hash (“#”). When writing code in R (or any other command-line interface) it is good practice to preface one’s code with brief comments that describe what a block of code is doing. Writing these comments can allow someone else (or your future self) to read and quickly understand the code more easily than otherwise might be the case. The hash before the comment effectively tells R that the subsequent text is a comment, and should be ignored when running a script If one does not preface the comment with a hash, R wouldn’t know to ignore the comment, and would throw an error message.
Now that we’ve taken care of these preliminary steps, let’s go ahead and load our data into R Studio. Below, we’ll first load in a spatial dataset of world boundaries, and then read in our World Bank dataset using the WDI package.
Before proceeding, however, it is useful to briefly consider the concept of object asssignment, which will make the subsequent sections easier to follow. Consider the following example:
# assign value 5 to new object named x
x<-5
In the code above, we used R’s assignment operator (<-, i.e. a left-facing arrow) to assign value 5 to an object named “x.” Now that an object named “x” has been created and assigned the value 5, printing “x” in our console (or printing “x” in our script and running it) will return the value 5:
# Print value assigned to object "x"
x
## [1] 5
More generally, the process of assignment effectively equates the output created by the code on the right side of the assignment operator (<-) to an object with a name that is specified on the left side of the assignment operator. Whenever we want to look at the value assigned to an object (i.e. the output created by the code to the right side of the assignment operator), we simply print the name of the object in the R console (or print the name and run it within a script).
While the example above was very simple, we can assign virtually any R code, and by extension, the data structure(s) generated by that code (such as datasets, maps, graphs) to an R object. Indeed, we’ll use the basic principle of object assignment introduced above to assign the datasets we’ll import below to new objects. Note that object names are arbitrary and could be virtually anything, but it is good practice for object names to describe their contents. If this is new, it will begin to make more sense as we go.
When working with spatial data in R, we will sometimes want to import data that is stored on our computer. There are several functions in the sf package that will allow us to easily import saved or downloaded spatial data into R; the most commonly used sf package function to load saved spatial vector data into R is the st_read() function. For more details, please consult the st_read() function’s documentation by typing ?st_read().
In our case, however, we won’t have to download and import the spatial data we need into R Studio from our computer’s local drive. That is because there are R packages that already provide this spatial data, and allow us to directly load it into memory. In particular, we’ll use the ne_countries() function of the rnaturalearth package to bring a spatial dataset of country borders into our R environment, and then assign it to an object that we will name country_boundaries:
# Brings spatial dataset of country boundaries into R environment using the rnaturalearth package, and then assigns this spatial dataset to an object named "country_boundaries"
country_boundaries<-ne_countries(scale="medium", returnclass="sf")
Note the two arguments we pass to the ne_countries function: the “scale” argument specifies that we want to use a medium scale when rendering the map (the other options are ‘small’ and ‘large’), while the “returnclass” argument specifies that we want the spatial dataset as an sf object.
Now that we have our spatial dataset loaded into our R environment and assigned to country_boundaries, let’s open up this dataset and see what it looks like. The best way to view a dataset in R studio is to pass the name of the relevant object to the View() function, which will open up the dataset in R Studio’s built-in data viewer.
# View "country_boundaries" data in R Studio Data Viewer
View(country_boundaries)
tm_shape(country_boundaries)+
tm_polygons()
WDI_variables<-WDIsearch()
View(WDI_variables)
WDI_variables %>% datatable(extensions=c("Scroller", "FixedColumns"), options = list(
deferRender = TRUE,
scrollY = 350,
scrollX = 350,
dom = "t",
scroller = TRUE,
fixedColumns = list(leftColumns = 3)
))
trade_gdp_string<-"NE.TRD.GNFS.ZS"
trade_gdp_2010_2018<-WDI(country="all",
indicator=trade_gdp_string,
start=2010,
end=2018,
extra=T)
trade_gdp_2015<-trade_gdp_2010_2018 %>%
filter(year=="2015")
View(trade_gdp_2015)
trade_2015_spatial<-full_join(country_boundaries, trade_gdp_2015,
by=c("iso_a3"="iso3c"))
View(trade_2015_spatial)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
trade_2015_spatial<-trade_2015_spatial %>%
rename(trade_pct_gdp=NE.TRD.GNFS.ZS)
tm_shape(trade_2015_spatial)+
tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
n=5)
## Warning: The shape trade_2015_spatial contains empty units.
## Make map with inset map
unique(trade_2015_spatial$subregion)
## [1] "Caribbean" "Southern Asia"
## [3] "Middle Africa" "Southern Europe"
## [5] "Northern Europe" "Western Asia"
## [7] "South America" "Polynesia"
## [9] "Antarctica" "Australia and New Zealand"
## [11] "Seven seas (open ocean)" "Western Europe"
## [13] "Eastern Africa" "Western Africa"
## [15] "Eastern Europe" "Central America"
## [17] "Northern America" "South-Eastern Asia"
## [19] "Southern Africa" "Eastern Asia"
## [21] "Northern Africa" "Melanesia"
## [23] "Micronesia" "Central Asia"
## [25] NA
trade_2015_spatial_sea<-trade_2015_spatial %>%
filter(subregion=="South-Eastern Asia")
y<-tm_shape(trade_2015_spatial_sea)+
tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
n=5)
y
library(grid)
x<-tm_shape(trade_2015_spatial)+
tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
n=5)
x
## Warning: The shape trade_2015_spatial contains empty units.
print(y, vp = viewport(0.8, 0.27, width = 0.5, height = 0.5))
library(grid)
x<-tm_shape(trade_2015_spatial)+
tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
n=5)
x
## Warning: The shape trade_2015_spatial contains empty units.
print(y, vp = viewport(0.8, 0.18, width = 0.3, height = 0.3))
## Legend labels were too wide. The labels have been resized to 0.45, 0.45, 0.41, 0.38, 0.38. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.
## Map change over time
trade_gdp_2010_2015<-trade_gdp_2010_2018 %>%
filter(year=="2010"|year=="2015")
trade_gdp_2010_2015_wide<-trade_gdp_2010_2015 %>%
pivot_wider(names_from=year, values_from=c(NE.TRD.GNFS.ZS))
trade_gdp_2010_2015_wide<-trade_gdp_2010_2015_wide %>%
rename("trade_gdp_2015"="2015") %>%
rename("trade_gdp_2010"="2010")
trade_gdp_2010_2015_wide<-trade_gdp_2010_2015_wide %>%
mutate("2015_2010_pctpt_difference"=(trade_gdp_2015-trade_gdp_2010))
trade_gdp_2010_2015_wide_spatial<-full_join(country_boundaries, trade_gdp_2010_2015_wide,
by=c("iso_a3"="iso3c"))
tm_shape(trade_gdp_2010_2015_wide_spatial)+
tm_polygons(col="2015_2010_pctpt_difference", palette="Greens", style="jenks",
n=5, midpoint=T)
## Warning: The shape trade_gdp_2010_2015_wide_spatial contains empty units.
https://ecodiv.earth/post/creating-a-map-with-inset-using-tmap/